<!DOCTYPE html>
<html>

<head>

<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>jandan pic v2</title>


<style type="text/css">
*{margin:0;padding:0;}
body {
	font:13.34px helvetica,arial,freesans,clean,sans-serif;
	color:black;
	line-height:1.4em;
	background-color: #F8F8F8;
	padding: 0.7em;
}
p {
	margin:1em 0;
	line-height:1.5em;
}
table {
	font-size:inherit;
	font:100%;
	margin:1em;
}
table th{border-bottom:1px solid #bbb;padding:.2em 1em;}
table td{border-bottom:1px solid #ddd;padding:.2em 1em;}
input[type=text],input[type=password],input[type=image],textarea{font:99% helvetica,arial,freesans,sans-serif;}
select,option{padding:0 .25em;}
optgroup{margin-top:.5em;}
pre,code{font:12px Monaco,"Courier New","DejaVu Sans Mono","Bitstream Vera Sans Mono",monospace;}
pre {
	margin:1em 0;
	font-size:12px;
	background-color:#eee;
	border:1px solid #ddd;
	padding:5px;
	line-height:1.5em;
	color:#444;
	overflow:auto;
	-webkit-box-shadow:rgba(0,0,0,0.07) 0 1px 2px inset;
	-webkit-border-radius:3px;
	-moz-border-radius:3px;border-radius:3px;
}
pre code {
	padding:0;
	font-size:12px;
	background-color:#eee;
	border:none;
}
code {
	font-size:12px;
	background-color:#f8f8ff;
	color:#444;
	padding:0 .2em;
	border:1px solid #dedede;
}
img{border:0;max-width:100%;}
abbr{border-bottom:none;}
a{color:#4183c4;text-decoration:none;}
a:hover{text-decoration:underline;}
a code,a:link code,a:visited code{color:#4183c4;}
h2,h3{margin:1em 0;}
h1,h2,h3,h4,h5,h6{border:0;}
h1{font-size:170%;border-top:4px solid #aaa;padding-top:.5em;margin-top:1.5em;}
h1:first-child{margin-top:0;padding-top:.25em;border-top:none;}
h2{font-size:150%;margin-top:1.5em;border-top:4px solid #e0e0e0;padding-top:.5em;}
h3{margin-top:1em;}
hr{border:1px solid #ddd;}
ul{margin:1em 0 1em 2em;}
ol{margin:1em 0 1em 2em;}
ul li,ol li{margin-top:.5em;margin-bottom:.5em;}
ul ul,ul ol,ol ol,ol ul{margin-top:0;margin-bottom:0;}
blockquote{margin:1em 0;border-left:5px solid #ddd;padding-left:.6em;color:#555;}
dt{font-weight:bold;margin-left:1em;}
dd{margin-left:2em;margin-bottom:1em;}
sup {
    font-size: 0.83em;
    vertical-align: super;
    line-height: 0;
}
kbd {
  display: inline-block;padding: 3px 5px;font-size: 11px;line-height: 10px;color: #555;vertical-align: middle;background-color: #fcfcfc;border: solid 1px #ccc;border-bottom-color: #bbb;border-radius: 3px;box-shadow: inset 0 -1px 0 #bbb;
}
* {
	-webkit-print-color-adjust: exact;
}
@media screen and (min-width: 914px) {
    body {
        width: 854px;
        margin:0 auto;
    }
}
@media print {
	table, pre {
		page-break-inside: avoid;
	}
	pre {
		word-wrap: break-word;
	}
}
</style>

<style type="text/css">
/**
 * prism.js default theme for JavaScript, CSS and HTML
 * Based on dabblet (http://dabblet.com)
 * @author Lea Verou
 */

code[class*="language-"],
pre[class*="language-"] {
	color: black;
	background: none;
	text-shadow: 0 1px white;
	font-family: Consolas, Monaco, 'Andale Mono', 'Ubuntu Mono', monospace;
	text-align: left;
	white-space: pre;
	word-spacing: normal;
	word-break: normal;
	word-wrap: normal;
	line-height: 1.5;

	-moz-tab-size: 4;
	-o-tab-size: 4;
	tab-size: 4;

	-webkit-hyphens: none;
	-moz-hyphens: none;
	-ms-hyphens: none;
	hyphens: none;
}

pre[class*="language-"]::-moz-selection, pre[class*="language-"] ::-moz-selection,
code[class*="language-"]::-moz-selection, code[class*="language-"] ::-moz-selection {
	text-shadow: none;
	background: #b3d4fc;
}

pre[class*="language-"]::selection, pre[class*="language-"] ::selection,
code[class*="language-"]::selection, code[class*="language-"] ::selection {
	text-shadow: none;
	background: #b3d4fc;
}

@media print {
	code[class*="language-"],
	pre[class*="language-"] {
		text-shadow: none;
	}
}

/* Code blocks */
pre[class*="language-"] {
	padding: 1em;
	margin: .5em 0;
	overflow: auto;
}

:not(pre) > code[class*="language-"],
pre[class*="language-"] {
	background: #f5f2f0;
}

/* Inline code */
:not(pre) > code[class*="language-"] {
	padding: .1em;
	border-radius: .3em;
	white-space: normal;
}

.token.comment,
.token.prolog,
.token.doctype,
.token.cdata {
	color: slategray;
}

.token.punctuation {
	color: #999;
}

.namespace {
	opacity: .7;
}

.token.property,
.token.tag,
.token.boolean,
.token.number,
.token.constant,
.token.symbol,
.token.deleted {
	color: #905;
}

.token.selector,
.token.attr-name,
.token.string,
.token.char,
.token.builtin,
.token.inserted {
	color: #690;
}

.token.operator,
.token.entity,
.token.url,
.language-css .token.string,
.style .token.string {
	color: #a67f59;
	background: hsla(0, 0%, 100%, .5);
}

.token.atrule,
.token.attr-value,
.token.keyword {
	color: #07a;
}

.token.function {
	color: #DD4A68;
}

.token.regex,
.token.important,
.token.variable {
	color: #e90;
}

.token.important,
.token.bold {
	font-weight: bold;
}
.token.italic {
	font-style: italic;
}

.token.entity {
	cursor: help;
}
</style>


</head>

<body>

<p>参考代码：<a href="http://git.oschina.net/crossin/crawler/blob/master/ex2_jandan_pic_v2.py" target="_blank">http://git.oschina.net/crossin/crawler/blob/master/ex2_jandan_pic_v2.py</a></p>

<h2 id="toc_0">1. BeautifulSoup</h2>

<p>BeautifulSoup 是一个可以从 HTML 或 XML 文件中提取数据的 Python 库。它能够通过你喜欢的转换器实现惯用的文档导航、查找、修改文档的方式。BeautifulSoup 会帮你节省数小时甚至数天的工作时间.</p>

<h3 id="toc_1">安装</h3>

<p>通过 <code>pip</code> 安装（推荐）：</p>

<div><pre><code class="language-none">pip install beautifulsoup4</code></pre></div>

<p>下载 <a href="http://www.crummy.com/software/BeautifulSoup/download/4.x/" target="_blank">bs4源码</a>，通过命令安装：</p>

<div><pre><code class="language-none">python setup.py install</code></pre></div>

<h3 id="toc_2">如何使用</h3>

<p>将一段文档传入 BeautifulSoup 的构造方法，就能得到一个文档的对象。 可以传入一段字符串或文件对象。</p>

<div><pre><code class="language-python">from bs4 import BeautifulSoup
soup = BeautifulSoup(&quot;&lt;html&gt;data&lt;/html&gt;&quot;)
soup = BeautifulSoup(open(&quot;index.html&quot;))</code></pre></div>

<p>首先，文档被转换成Unicode。然后，BeautifulSoup 通过解析器来解析这段文档。</p>

<h3 id="toc_3">示例</h3>

<p>以 <em>爱丽丝梦游仙境</em> 的一段内容作为示例。</p>

<div><pre><code class="language-python">html_doc = &quot;&quot;&quot;
&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse&#39;s story&lt;/title&gt;&lt;/head&gt;
&lt;body&gt;
&lt;p class=&quot;title&quot;&gt;&lt;b&gt;The Dormouse&#39;s story&lt;/b&gt;&lt;/p&gt;

&lt;p class=&quot;story&quot;&gt;Once upon a time there were three little sisters; and their names were
&lt;a href=&quot;http://example.com/elsie&quot; class=&quot;sister&quot; id=&quot;link1&quot;&gt;Elsie&lt;/a&gt;,
&lt;a href=&quot;http://example.com/lacie&quot; class=&quot;sister&quot; id=&quot;link2&quot;&gt;Lacie&lt;/a&gt; and
&lt;a href=&quot;http://example.com/tillie&quot; class=&quot;sister&quot; id=&quot;link3&quot;&gt;Tillie&lt;/a&gt;;
and they lived at the bottom of a well.&lt;/p&gt;

&lt;p class=&quot;story&quot;&gt;...&lt;/p&gt;
&quot;&quot;&quot;</code></pre></div>

<p>创建 BeautifulSoup 的对象，并按格式输出：</p>

<div><pre><code class="language-python">from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, &#39;html.parser&#39;)

print(soup.prettify())
# &lt;html&gt;
#  &lt;head&gt;
#   &lt;title&gt;
#    The Dormouse&#39;s story
#   &lt;/title&gt;
#  &lt;/head&gt;
#  &lt;body&gt;
#    ...
#  &lt;/body&gt;
# &lt;/html&gt;</code></pre></div>

<p>获取文档中的数据：</p>

<div><pre><code class="language-python">soup.title
# &lt;title&gt;The Dormouse&#39;s story&lt;/title&gt;

soup.title.name
# u&#39;title&#39;

soup.title.string
# u&#39;The Dormouse&#39;s story&#39;

soup.title.parent.name
# u&#39;head&#39;

soup.p
# &lt;p class=&quot;title&quot;&gt;&lt;b&gt;The Dormouse&#39;s story&lt;/b&gt;&lt;/p&gt;

soup.p[&#39;class&#39;]
# u&#39;title&#39;

soup.a
# &lt;a class=&quot;sister&quot; href=&quot;http://example.com/elsie&quot; id=&quot;link1&quot;&gt;Elsie&lt;/a&gt;

soup.find_all(&#39;a&#39;)
# [&lt;a class=&quot;sister&quot; href=&quot;http://example.com/elsie&quot; id=&quot;link1&quot;&gt;Elsie&lt;/a&gt;,
#  &lt;a class=&quot;sister&quot; href=&quot;http://example.com/lacie&quot; id=&quot;link2&quot;&gt;Lacie&lt;/a&gt;,
#  &lt;a class=&quot;sister&quot; href=&quot;http://example.com/tillie&quot; id=&quot;link3&quot;&gt;Tillie&lt;/a&gt;]

soup.find(id=&quot;link3&quot;)
# &lt;a class=&quot;sister&quot; href=&quot;http://example.com/tillie&quot; id=&quot;link3&quot;&gt;Tillie&lt;/a&gt;</code></pre></div>

<p>从文档中找到所有 class 为 sister 的 &lt;a&gt; 标签的链接:</p>

<div><pre><code class="language-python">for link in soup.find_all(&#39;a&#39;, class_=&#39;sister&#39;):
    print(link.get(&#39;href&#39;))
    # http://example.com/elsie
    # http://example.com/lacie
    # http://example.com/tillie</code></pre></div>

<h3 id="toc_4">参考资料</h3>

<ol>
<li><a href="http://beautifulsoup.readthedocs.io/zh_CN/latest/" target="_blank">BeautifulSoup 4.4.0 中文文档</a></li>
<li><a href="http://cuiqingcai.com/1319.html" target="_blank">Python爬虫利器二之Beautiful Soup的用法 - 崔庆才</a></li>
</ol>

<h2 id="toc_5">2. python shell</h2>

<p>除了在编辑器中完成代码再执行，Python 也提供了 <strong>交互式的带提示符的解释器</strong>，即写一行代码，就可以立刻被运行，然后方便查看到结果，进行调试。这就是我们说的 python shell。</p>

<p>python shell 通常都用于调试和测试代码，而不适合正式开发程序。</p>

<h3 id="toc_6">从命令行中运行</h3>

<p>当你安装好 python，并正确配置系统变量 PATH 后（linux 和 mac 上通常都预装并配置好了 python），在命令行里输入 python，会看到诸如以下的提示：</p>

<div><pre><code class="language-none">$ python
Python 3.6.1 (default, Mar 23 2017, 14:49:01)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.
&gt;&gt;&gt;</code></pre></div>

<p>windows 中也可以直接打开菜单栏中的 <strong>Python(command line)</strong>。</p>

<p>在提示符后输入代码，回车。程序就会执行这一行代码，并自动输出代码的返回结果。</p>

<h3 id="toc_7">IDLE</h3>

<p>从菜单栏或命令行打开 Python 自带的编辑器 IDLE，会默认打开一个 python shell，同样可以直接调试代码。</p>

<h3 id="toc_8">ipython</h3>

<p>IPython 是一个第三方的 python shell，比默认的 python shell 好用得多，支持变量自动补全、自动缩进，支持 bash shell 命令，内置了许多很有用的功能和函数。</p>

<p>可以通过 <code>pip</code> 安装：</p>

<div><pre><code class="language-none">pip install ipython</code></pre></div>

<h3 id="toc_9">参考资料</h3>

<ol>
<li><a href="https://mp.weixin.qq.com/s?__biz=MjM5MDEyMDk4Mw==&amp;tempkey=1CWwSSQp7vT%2FCPPHiVeCuxn3CFbTxf1Ux5NfmMDCzur57Jj%2F0S9ET5ftgAxMdmiwoHaKTbXyJc6Y3E3eclIKjN5JdtPUQPweybxpVxYk8dA4fU4PdkXF9idvLDZNZXdqjSEfzYLREe6bqPHBHoh%2F1g%3D%3D&amp;#rd" target="_blank">python shell</a></li>
<li><a href="http://www.crifan.com/how_to_do_python_development_under_windows_environment/" target="_blank">【多图详解】如何使用Python Shell（command line模式和GUI模式）- crifan</a></li>
<li><a href="http://ipython.org/index.html" target="_blank">ipython 官网</a></li>
</ol>



<script type="text/javascript">
var _self="undefined"!=typeof window?window:"undefined"!=typeof WorkerGlobalScope&&self instanceof WorkerGlobalScope?self:{},Prism=function(){var e=/\blang(?:uage)?-(\w+)\b/i,t=0,n=_self.Prism={util:{encode:function(e){return e instanceof a?new a(e.type,n.util.encode(e.content),e.alias):"Array"===n.util.type(e)?e.map(n.util.encode):e.replace(/&/g,"&amp;").replace(/</g,"&lt;").replace(/\u00a0/g," ")},type:function(e){return Object.prototype.toString.call(e).match(/\[object (\w+)\]/)[1]},objId:function(e){return e.__id||Object.defineProperty(e,"__id",{value:++t}),e.__id},clone:function(e){var t=n.util.type(e);switch(t){case"Object":var a={};for(var r in e)e.hasOwnProperty(r)&&(a[r]=n.util.clone(e[r]));return a;case"Array":return e.map&&e.map(function(e){return n.util.clone(e)})}return e}},languages:{extend:function(e,t){var a=n.util.clone(n.languages[e]);for(var r in t)a[r]=t[r];return a},insertBefore:function(e,t,a,r){r=r||n.languages;var l=r[e];if(2==arguments.length){a=arguments[1];for(var i in a)a.hasOwnProperty(i)&&(l[i]=a[i]);return l}var o={};for(var s in l)if(l.hasOwnProperty(s)){if(s==t)for(var i in a)a.hasOwnProperty(i)&&(o[i]=a[i]);o[s]=l[s]}return n.languages.DFS(n.languages,function(t,n){n===r[e]&&t!=e&&(this[t]=o)}),r[e]=o},DFS:function(e,t,a,r){r=r||{};for(var l in e)e.hasOwnProperty(l)&&(t.call(e,l,e[l],a||l),"Object"!==n.util.type(e[l])||r[n.util.objId(e[l])]?"Array"!==n.util.type(e[l])||r[n.util.objId(e[l])]||(r[n.util.objId(e[l])]=!0,n.languages.DFS(e[l],t,l,r)):(r[n.util.objId(e[l])]=!0,n.languages.DFS(e[l],t,null,r)))}},plugins:{},highlightAll:function(e,t){var a={callback:t,selector:'code[class*="language-"], [class*="language-"] code, code[class*="lang-"], [class*="lang-"] code'};n.hooks.run("before-highlightall",a);for(var r,l=a.elements||document.querySelectorAll(a.selector),i=0;r=l[i++];)n.highlightElement(r,e===!0,a.callback)},highlightElement:function(t,a,r){for(var l,i,o=t;o&&!e.test(o.className);)o=o.parentNode;o&&(l=(o.className.match(e)||[,""])[1],i=n.languages[l]),t.className=t.className.replace(e,"").replace(/\s+/g," ")+" language-"+l,o=t.parentNode,/pre/i.test(o.nodeName)&&(o.className=o.className.replace(e,"").replace(/\s+/g," ")+" language-"+l);var s=t.textContent,u={element:t,language:l,grammar:i,code:s};if(!s||!i)return n.hooks.run("complete",u),void 0;if(n.hooks.run("before-highlight",u),a&&_self.Worker){var c=new Worker(n.filename);c.onmessage=function(e){u.highlightedCode=e.data,n.hooks.run("before-insert",u),u.element.innerHTML=u.highlightedCode,r&&r.call(u.element),n.hooks.run("after-highlight",u),n.hooks.run("complete",u)},c.postMessage(JSON.stringify({language:u.language,code:u.code,immediateClose:!0}))}else u.highlightedCode=n.highlight(u.code,u.grammar,u.language),n.hooks.run("before-insert",u),u.element.innerHTML=u.highlightedCode,r&&r.call(t),n.hooks.run("after-highlight",u),n.hooks.run("complete",u)},highlight:function(e,t,r){var l=n.tokenize(e,t);return a.stringify(n.util.encode(l),r)},tokenize:function(e,t){var a=n.Token,r=[e],l=t.rest;if(l){for(var i in l)t[i]=l[i];delete t.rest}e:for(var i in t)if(t.hasOwnProperty(i)&&t[i]){var o=t[i];o="Array"===n.util.type(o)?o:[o];for(var s=0;s<o.length;++s){var u=o[s],c=u.inside,g=!!u.lookbehind,h=!!u.greedy,f=0,d=u.alias;u=u.pattern||u;for(var p=0;p<r.length;p++){var m=r[p];if(r.length>e.length)break e;if(!(m instanceof a)){u.lastIndex=0;var y=u.exec(m),v=1;if(!y&&h&&p!=r.length-1){var b=r[p+1].matchedStr||r[p+1],k=m+b;if(p<r.length-2&&(k+=r[p+2].matchedStr||r[p+2]),u.lastIndex=0,y=u.exec(k),!y)continue;var w=y.index+(g?y[1].length:0);if(w>=m.length)continue;var _=y.index+y[0].length,P=m.length+b.length;if(v=3,P>=_){if(r[p+1].greedy)continue;v=2,k=k.slice(0,P)}m=k}if(y){g&&(f=y[1].length);var w=y.index+f,y=y[0].slice(f),_=w+y.length,S=m.slice(0,w),O=m.slice(_),j=[p,v];S&&j.push(S);var A=new a(i,c?n.tokenize(y,c):y,d,y,h);j.push(A),O&&j.push(O),Array.prototype.splice.apply(r,j)}}}}}return r},hooks:{all:{},add:function(e,t){var a=n.hooks.all;a[e]=a[e]||[],a[e].push(t)},run:function(e,t){var a=n.hooks.all[e];if(a&&a.length)for(var r,l=0;r=a[l++];)r(t)}}},a=n.Token=function(e,t,n,a,r){this.type=e,this.content=t,this.alias=n,this.matchedStr=a||null,this.greedy=!!r};if(a.stringify=function(e,t,r){if("string"==typeof e)return e;if("Array"===n.util.type(e))return e.map(function(n){return a.stringify(n,t,e)}).join("");var l={type:e.type,content:a.stringify(e.content,t,r),tag:"span",classes:["token",e.type],attributes:{},language:t,parent:r};if("comment"==l.type&&(l.attributes.spellcheck="true"),e.alias){var i="Array"===n.util.type(e.alias)?e.alias:[e.alias];Array.prototype.push.apply(l.classes,i)}n.hooks.run("wrap",l);var o="";for(var s in l.attributes)o+=(o?" ":"")+s+'="'+(l.attributes[s]||"")+'"';return"<"+l.tag+' class="'+l.classes.join(" ")+'" '+o+">"+l.content+"</"+l.tag+">"},!_self.document)return _self.addEventListener?(_self.addEventListener("message",function(e){var t=JSON.parse(e.data),a=t.language,r=t.code,l=t.immediateClose;_self.postMessage(n.highlight(r,n.languages[a],a)),l&&_self.close()},!1),_self.Prism):_self.Prism;var r=document.currentScript||[].slice.call(document.getElementsByTagName("script")).pop();return r&&(n.filename=r.src,document.addEventListener&&!r.hasAttribute("data-manual")&&document.addEventListener("DOMContentLoaded",n.highlightAll)),_self.Prism}();"undefined"!=typeof module&&module.exports&&(module.exports=Prism),"undefined"!=typeof global&&(global.Prism=Prism);
</script>

<script type="text/javascript">
Prism.languages.python={"triple-quoted-string":{pattern:/"""[\s\S]+?"""|'''[\s\S]+?'''/,alias:"string"},comment:{pattern:/(^|[^\\])#.*/,lookbehind:!0},string:/("|')(?:\\?.)*?\1/,"function":{pattern:/((?:^|\s)def[ \t]+)[a-zA-Z_][a-zA-Z0-9_]*(?=\()/g,lookbehind:!0},"class-name":{pattern:/(\bclass\s+)[a-z0-9_]+/i,lookbehind:!0},keyword:/\b(?:as|assert|async|await|break|class|continue|def|del|elif|else|except|exec|finally|for|from|global|if|import|in|is|lambda|pass|print|raise|return|try|while|with|yield)\b/,"boolean":/\b(?:True|False)\b/,number:/\b-?(?:0[bo])?(?:(?:\d|0x[\da-f])[\da-f]*\.?\d*|\.\d+)(?:e[+-]?\d+)?j?\b/i,operator:/[-+%=]=?|!=|\*\*?=?|\/\/?=?|<[<=>]?|>[=>]?|[&|^~]|\b(?:or|and|not)\b/,punctuation:/[{}[\];(),.:]/};
</script>


</body>

</html>
